Conversation


@olegmukhin olegmukhin commented Jul 19, 2025

New filter documentation page outlines the description, example configuration, and CSV handling for the new Lookup filter, to be implemented as part of fluent/fluent-bit#10620.

Summary by CodeRabbit

  • Documentation
    • Added comprehensive Lookup filter documentation and navigation entry: configuration options with defaults; examples in multiple config formats; sample input/output; metrics (processed/matched/skipped); behavior notes on case sensitivity, header handling, quotes, whitespace trimming, multiline records and data-type handling; clarification that the lookup table is loaded at startup.


Member

@alexakreizinger alexakreizinger left a comment


thanks for incorporating those changes @olegmukhin (and thank you for opening a PR in the first place—I meant to say that earlier but forgot 😅)

I'm approving this but I'll let you merge it... not sure if we're waiting on corresponding code changes before the docs are ready to go live :)

New documentation page outlines description, example configuration
and CSV handling for the new LookUp filter.

Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
Updated inputs based on code changes.
Added metrics section.
Made key considerations clearer with a separate section.

Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
Contributor

coderabbitai bot commented Nov 24, 2025

Walkthrough

Adds comprehensive documentation for a new Lookup filter: configuration fields (data_source, lookup_key, result_key, ignore_case, skip_header_row), examples (Fluent Bit YAML and config), sample input/output, CSV matching rules, metrics, and operational notes; also updates the docs SUMMARY/TOC.
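Based on the configuration fields listed in this walkthrough, a Lookup filter pipeline in Fluent Bit's YAML format might look like the sketch below. The field names (data_source, lookup_key, result_key, ignore_case, skip_header_row) come from the summary; the file path, key names, and values are illustrative only, not taken from the PR.

```yaml
pipeline:
  inputs:
    - name: dummy
      dummy: '{"hostname": "web-01"}'

  filters:
    - name: lookup
      match: '*'
      data_source: /etc/fluent-bit/devices.csv
      lookup_key: hostname
      result_key: device_owner
      ignore_case: true
      skip_header_row: true

  outputs:
    - name: stdout
      match: '*'
```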

Changes

  • Lookup filter documentation (pipeline/filters/lookup.md): Added full documentation: configuration fields with defaults, behavior (CSV loaded at startup, first column = key, second = value), case/quote/whitespace handling, skip_header_row, metrics (processed/matched/skipped), and examples (Fluent Bit YAML and Fluent Bit config) with sample input→output.
  • TOC / Summary entry (SUMMARY.md): Added a "Lookup" entry under Filters in the documentation SUMMARY/TOC.

Sequence Diagram(s)

(The changes are documentation-only; no control-flow or runtime behavior changes to diagram.)

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

  • Verify documentation accurately reflects implemented filter behavior (CSV parsing, key/value rules).
  • Confirm examples (YAML and Fluent Bit config) and sample I/O align with actual behavior.
  • Check SUMMARY.md entry placement and formatting.

Poem

🐇 I nibble rows of CSV bright,
First column sings, the next takes flight.
A hop, a match, a value found,
Docs tucked soft beneath the ground.
Read along — the lookup's sound.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: The title accurately describes the main change: adding a new documentation page for the Lookup filter to the pipeline filters section.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dab1621 and 5600edd.

📒 Files selected for processing (1)
  • pipeline/filters/lookup.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pipeline/filters/lookup.md


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
pipeline/filters/lookup.md (1)

143-152: Fix grammar: hyphenate "key-value" in compound adjective.

Line 145 should use a hyphen to join the words in the compound adjective.

Apply this diff to fix the grammar issue:

- The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.
+ The CSV is used to create an in-memory key-value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.

While refactoring, consider improving clarity: the phrase "All other columns in the CSV are ignored" could be stronger. Consider: "Any additional columns are ignored and not loaded."

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3002555 and e0d7761.

📒 Files selected for processing (1)
  • pipeline/filters/lookup.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
pipeline/filters/lookup.md

[grammar] ~145-~145: Use a hyphen to join words.
Context: ...e CSV is used to create an in-memory key value lookup table. Column 1 of the CSV ...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (6)
pipeline/filters/lookup.md (6)

1-3: Documentation intro is clear and accurate. The description succinctly explains the core lookup behavior without overcomplicating the specifics of record accessors and CSV column handling. ✓


14-15: Clarify skip_header_row parameter behavior vs. past implementation notes.

Past review comments indicated "The filter always skips the first row." However, the skip_header_row parameter now defaults to false, which would mean the first row is processed as data by default. This contradicts the earlier statement. Please clarify:

  1. Does the filter always skip the first row regardless of the skip_header_row setting?
  2. Or is skip_header_row a new feature that changes default behavior?
  3. If the default should be true to match the original behavior, this should be updated.

17-79: Configuration examples are clear and comprehensive. Both YAML and config formats are shown with proper syntax and all relevant parameters. The examples demonstrate practical usage with input, filter, and output stages. ✓


81-112: Example data is comprehensive and well-chosen. The input records cover important edge cases (various data types, case variations, quoted values, whitespace, missing fields), and the CSV file demonstrates duplicate key handling and quote escaping. This effectively illustrates the filter's behavior. ✓


115-129: Expected output is missing the whitespace-only hostname record.

The input data includes a record with {"hostname": " "} (line 95), which should pass through unchanged because it doesn't match any CSV entry. However, this record is missing from the expected output (lines 118–129). Please add it to complete the example.

The missing record should appear as:

{"hostname"=>" "}
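A minimal Python sketch of the pass-through behavior under discussion: the lookup value is trimmed, a whitespace-only value matches nothing, and an unmatched record is emitted unchanged. The table contents and the result_key name "source" are illustrative, not taken from the docs.

```python
table = {"web-01": "frontend"}  # stand-in for the loaded CSV table

def enrich(record, lookup_key="hostname", result_key="source"):
    """Add result_key when the trimmed lookup value matches the table;
    otherwise return the record unchanged."""
    key = str(record.get(lookup_key, "")).strip()
    # a whitespace-only value trims to "" and matches nothing,
    # so the record passes through unchanged
    return {**record, result_key: table[key]} if key in table else record

print(enrich({"hostname": "web-01"}))  # enriched with the matched value
print(enrich({"hostname": "   "}))     # unchanged
```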

131-141: Metrics documentation is clear and complete. The three counters (processed, matched, skipped) provide good visibility into filter operation, and the note about the "name" label is helpful for monitoring. ✓

Added link from summary.md (overwritten by conflict).

Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
Minor changes in language as recommended by vale.

Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
pipeline/filters/lookup.md (1)

143-152: Key considerations are comprehensive; fix compound adjective hyphenation.

The key considerations section effectively documents important behaviors (whitespace trimming, data type handling, CSV loading, etc.) and aligns with the examples provided.

However, there is a minor grammar issue at line 145: "key value lookup table" should use a hyphen between the compound adjectives: "key-value lookup table."

Apply this fix:

- The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.
+ The CSV is used to create an in-memory key-value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 604b0b1 and dab1621.

📒 Files selected for processing (1)
  • pipeline/filters/lookup.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
pipeline/filters/lookup.md

[grammar] ~145-~145: Use a hyphen to join words.
Context: ...e CSV is used to create an in-memory key value lookup table. Column 1 of the CSV ...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (7)
pipeline/filters/lookup.md (7)

1-4: Clear introduction and description.

The header and opening description accurately convey the filter's purpose and behavior. This aligns well with the approved suggestions from prior review.


5-16: Configuration parameters table is clear and complete.

Parameter descriptions are detailed and align with prior reviewer suggestions, especially the clarifications around ignore_case and skip_header_row behavior. The table includes helpful examples for record accessor syntax.


17-79: Example configurations are accurate and well-structured.

Both YAML and config format examples are consistent, realistic, and properly demonstrate the filter's configuration with all key parameters.


81-96: Comprehensive input examples covering edge cases.

The input records effectively demonstrate various data types (strings, numbers, booleans, objects, arrays) and edge cases (whitespace, quotes) that the filter will encounter.


98-111: CSV example is representative and well-structured.

The CSV data effectively demonstrates various scenarios including quoted values, escaped quotes, duplicate keys, and matches for the input examples. The header row and varied data types support the documented behavior.


113-129: Verify completeness of output example and quote escaping.

The output examples are helpful, but there appear to be two issues to verify:

  1. Missing output record: The input examples (lines 84-96) contain 12 records, but the output examples show only 10 records. The record with "hostname": " " (line 95) is missing from the output. If this record passes through unchanged (because a whitespace-only value doesn't match any CSV entry after trimming), it should still appear in the output. Please confirm whether this record should be included.

  2. Quote escaping accuracy (line 124): The CSV contains "quoted ""host""" (line 109) with CSV-style escaped quotes (two consecutive quotes representing a single literal quote). The output shows "quoted "host"". Please verify this represents the correct filter behavior and output format.
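The escaping convention at issue here can be checked against a standard RFC 4180 parser: two consecutive double quotes inside a quoted field decode to one literal quote. This demonstrates generic CSV behavior, not the filter's own parser:

```python
import csv
import io

# '"quoted ""host"""' is one quoted field containing a literal quoted word
row = next(csv.reader(io.StringIO('"quoted ""host""",matched')))
print(row)  # ['quoted "host"', 'matched']
```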


131-141: Metrics section is clear and follows conventions.

The three metrics appropriately track filter activity (processed, matched, skipped records), follow Prometheus naming conventions, and include helpful descriptions. The name label for instance identification is good for multi-filter scenarios.
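The three counters can be illustrated with a toy Python model. The exact conditions under which the real filter increments skipped are inferred here for illustration (records missing the lookup key), not confirmed by the docs, and the key names are hypothetical:

```python
class LookupMetrics:
    """Toy mirror of the documented processed/matched/skipped counters."""
    def __init__(self):
        self.processed = self.matched = self.skipped = 0

def lookup_with_metrics(record, table, metrics,
                        lookup_key="hostname", result_key="source"):
    metrics.processed += 1
    value = record.get(lookup_key)
    if value is None:
        metrics.skipped += 1  # assumption: records missing the key count as skipped
        return record
    key = str(value).strip()
    if key in table:
        metrics.matched += 1
        return {**record, result_key: table[key]}
    return record  # no match: passes through, only "processed" incremented

m = LookupMetrics()
table = {"web-01": "frontend"}
for rec in ({"hostname": "web-01"}, {"hostname": "db-09"}, {"other": 1}):
    lookup_with_metrics(rec, table, m)
print(m.processed, m.matched, m.skipped)  # 3 1 1
```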

Fixed broken record accessor link
Minor grammar enhancement

Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>